{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1e636636",
   "metadata": {},
   "source": [
    "## Demo notebook for New Zealand Wildlife Thermal Imaging\n",
    "\n",
    "This notebook demonstrates basic interaction with video clips from the [New Zealand Wildlife Thermal Imaging ](https://lila.science/datasets/new-zealand-wildlife-thermal-imaging/) dataset, which contains ~120k thermal video clips.  Data was provided by the [Cacophony Project](https://cacophony.org.nz/).\n",
    "\n",
    "This notebook assumes you've downloaded the metadata and all the videos; tell the notebook where you put them in the next cell.  You only need the HDF files if you want to run the last cell."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f65c1f8",
   "metadata": {},
   "source": [
    "### Constants and imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "768c3d81",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import json\n",
    "import random\n",
    "\n",
    "from collections import defaultdict\n",
    "from tqdm import tqdm\n",
    "from IPython.display import Video\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "random.seed(0)\n",
    "\n",
    "# Change this to point to the folder where you placed all the data\n",
    "base_folder = os.path.expanduser('~/tmp/new-zealand-wildlife-thermal-imaging')\n",
    "\n",
    "# The HDF files are only required if you want to run the last cell, but if you downloaded\n",
    "# them, change this to point to the folder where you placed the HDF files.\n",
    "hdf_folder = '/bigdata/home/sftp/cacophony-ferraro_/data/cacophony-thermal/'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1f8a946",
   "metadata": {},
   "source": [
    "### Make sure all the files are where we expect them to be"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "daca36eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "main_metadata_filename = os.path.join(base_folder,'new-zealand-wildlife-thermal-imaging.json')\n",
    "individual_metadata_folder = os.path.join(base_folder,'individual-metadata')\n",
    "video_folder = os.path.join(base_folder,'videos')\n",
    "\n",
    "assert os.path.isdir(base_folder)\n",
    "assert os.path.isdir(video_folder)\n",
    "assert os.path.isdir(individual_metadata_folder)\n",
    "assert os.path.isfile(main_metadata_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be11047e",
   "metadata": {},
   "source": [
    "### Open the metadata file, print some statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "772796f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(main_metadata_filename,'r') as f:\n",
    "    metadata = json.load(f)\n",
    "\n",
    "# Print basic info about the dataset    \n",
    "for k in metadata['info']:\n",
    "    print('{}: {}'.format(k,metadata['info'][k]))\n",
    "    \n",
    "# From now on, we only care about the 'clips' field    \n",
    "all_clip_metadata = metadata['clips']\n",
    "print('Read metadata for {} clips\\n'.format(len(all_clip_metadata)))\n",
    "\n",
    "# Count the number of videos associated with each label    \n",
    "label_to_video_count = defaultdict(int)\n",
    "\n",
    "for clip_metadata in tqdm(all_clip_metadata,file=sys.stdout):\n",
    "    \n",
    "    if clip_metadata['error'] is not None:\n",
    "        continue\n",
    "     \n",
    "    # This will be a list of labels attached to this video\n",
    "    labels_this_clip = set()\n",
    "    \n",
    "    # Labels are actually assigned to *track* (individual moving blobs), so\n",
    "    # look at all the tracks that were labeled on this video.\n",
    "    for track_info in clip_metadata['tracks']:\n",
    "        for tag in track_info['tags']:\n",
    "            tag_label = tag['label']\n",
    "            labels_this_clip.add(tag_label)\n",
    "    \n",
    "    for label in labels_this_clip:\n",
    "        label_to_video_count[label] += 1\n",
    "              \n",
    "# Print labels in descending order of frequency            \n",
    "label_to_video_count = {k: v for k, v in sorted(label_to_video_count.items(), \n",
    "                                                key=lambda item: item[1], reverse=True)}\n",
    "\n",
    "print('\\nLabels:\\n')\n",
    "\n",
    "for label in label_to_video_count:\n",
    "    print('{}: {}'.format(label,label_to_video_count[label]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e66a6f1",
   "metadata": {},
   "source": [
    "## Choose a clip to tinker with\n",
    "\n",
    "Run this cell and subsequent cells multiple times to tinker with multiple videos.  By default we fix the random seed in the first cell of this notebook, so you will get the same sequence every time you run this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d089ac1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# By default, pick a clip at random\n",
    "i_clip = random.randint(0,len(all_clip_metadata)-1)\n",
    "\n",
    "# If you want to pick a specific clip by ID...\n",
    "if False:\n",
    "    def find_clip(clip_id):\n",
    "        return [c['id'] for c in all_clip_metadata].index(clip_id)    \n",
    "    i_clip = find_clip(409532)\n",
    "    \n",
    "clip = all_clip_metadata[i_clip]\n",
    "assert clip['error'] is None, 'Oops, you had the bad luck of randomly choosing an invalid clip'\n",
    "\n",
    "print('Tinkering with clip {}:\\n'.format(i_clip))\n",
    "\n",
    "# Print information about this clip\n",
    "for k in clip:\n",
    "    if k == 'tracks':\n",
    "        print('Tracks:')\n",
    "        for track in clip['tracks']:\n",
    "            print('  ' + str(track))\n",
    "    elif k == 'calibration_frames':\n",
    "        print('N calibration frame: {}'.format(len(clip['calibration_frames'])))\n",
    "    else:\n",
    "        print('{}: {}'.format(k,clip[k]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39232a97",
   "metadata": {},
   "source": [
    "### Play the videos file associated with this clip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05bc90c7",
   "metadata": {},
   "source": [
    "#### First the normalized (but not background-corrected) video"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ae85568",
   "metadata": {},
   "outputs": [],
   "source": [
    "video_filename = os.path.join(video_folder,clip['video_filename'])\n",
    "assert os.path.isfile(video_filename)\n",
    "\n",
    "print('Playing non-background-corrected video ({}):'.format(video_filename))\n",
    "Video(video_filename,embed=True,width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ba19c7e",
   "metadata": {},
   "source": [
    "### Now the background-corrected video"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d401961",
   "metadata": {},
   "outputs": [],
   "source": [
    "filtered_video_filename = os.path.join(video_folder,clip['filtered_video_filename'])\n",
    "assert os.path.isfile(filtered_video_filename)\n",
    "\n",
    "print('Playing background-corrected video ({}):'.format(filtered_video_filename))\n",
    "Video(filtered_video_filename,embed=True,width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4e0fd8",
   "metadata": {},
   "source": [
    "### Render the tracks associated with this clip\n",
    "\n",
    "These should match whatever moving blobs you just saw in the video.  These are not stored in the main metadata file (because they make it very large, and you don't *need* this information for most things you might do with this dataset), so we need to load the metadata file for this specific file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e853de95",
   "metadata": {},
   "outputs": [],
   "source": [
    "clip_metadata_filename = clip['metadata_filename']\n",
    "with open(os.path.join(individual_metadata_folder,clip_metadata_filename)) as f:\n",
    "    clip_metadata_with_position = json.load(f)\n",
    "\n",
    "# Print some basic metadata about each track, and plot them\n",
    "for track in clip_metadata_with_position['tracks']:\n",
    "    for k in track:\n",
    "        if k == 'points':\n",
    "            print('{} points'.format(len(track['points'])))\n",
    "        else:\n",
    "            print('{}: {}'.format(k,track[k]))\n",
    "\n",
    "# A list of (x[],y[]) tracks\n",
    "track_coords = []\n",
    "for track in clip_metadata_with_position['tracks']:\n",
    "    # These are organized as x/y/frame    \n",
    "    points = track['points']\n",
    "    x = [p[0] for p in points]\n",
    "    y = [(clip['height'] - p[1]) for p in points]\n",
    "    track_coords.append((x,y))\n",
    "    \n",
    "import matplotlib.pyplot as pl\n",
    "for t in track_coords:\n",
    "    x = t[0]\n",
    "    y = t[1]\n",
    "    plt.plot(x,y,'.')\n",
    "plt.xlim([0, clip['width']])\n",
    "plt.ylim([0, clip['height']])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65179e91",
   "metadata": {},
   "source": [
    "### Explore the HDF file associated with this clip\n",
    "\n",
    "It's unlikely you'll need anything from the HDF file if you're doing Machine Learning Stuff (tm) with this dataset, but just in case..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05b66ec9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import h5py\n",
    "import numpy as np\n",
    "\n",
    "h5f = h5py.File(os.path.join(hdf_folder,clip['hdf_filename']),'r')\n",
    "\n",
    "# Print the metadata in the HDF file\n",
    "clip_attrs = h5f.attrs\n",
    "for k in clip_attrs:\n",
    "    if isinstance(clip_attrs[k],np.ndarray):\n",
    "        print('{}: array of length {}'.format(k,len(clip_attrs[k])))\n",
    "    else:\n",
    "        print('{}: {}'.format(k,clip_attrs[k]))\n",
    "        \n",
    "# This is where the actual frames are, as an array of size nframes,x,y\n",
    "thermal_frames = h5f['frames']['thermals']\n",
    "\n",
    "assert clip_attrs.get('num_frames') == thermal_frames.shape[0]\n",
    "\n",
    "# Show an image of the first frame\n",
    "thermal_frames_array = np.array(thermal_frames)\n",
    "sample_frame = thermal_frames_array[0]\n",
    "plt.imshow(sample_frame,cmap='jet')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:cameratraps-detector] *",
   "language": "python",
   "name": "conda-env-cameratraps-detector-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
